Search Result

Journals

Publication Years

Keywords

Please wait a minute...

For Selected:

Download Citations
EndNote Ris BibTeX

Toggle Thumbnails

Select

Chi-square distribution based similarity join query algorithm on high-dimensional data

MA Youzhong, JIA Shijie, ZHANG Yongxin

Journal of Computer Applications 2016, 36 (7): 1993-1997. DOI: 10.11772/j.issn.1001-9081.2016.07.1993

Abstract （619）

PDF （829KB）（355）

Save

To deal with the curse of dimensionality and costly computation problems existed in high-dimensional similarity join query, the high-dimensional data were mapped to low-dimensional space based on p-stable distribution. According the definition of chi-square distribution, a theorem was proved:if the distance of two points in low-dimensional space is greater than kε, the probability that the distance of two points in original space is greater than ε has a lower bound. So the effective filtering can be performed at relative low cost in the mapped space. A novel chi-square distribution-based similarity join query algorithm on high-dimensional data was proposed. In order to further improve the query efficiency, another similarity join query algorithm based on double filtering was also proposed. Comprehensive experiments were performed. The experimental results show that the proposed approaches have good performance. The recall of the chi-square distribution-based similarity join query algorithm is larger than 90%. The double filtering based similarity join query algorithm can further improve the efficiency, but it will lose some recall rate. Chi-square distribution based similarity join query algorithm is suitable for the query tasks which are critical of the query performance but not critical of the recall; otherwise, the similarity join query algorithm based on double filtering is favorable.

Reference | Related Articles | Metrics